Multiple ANOVA in R

What if I have many variables to compare?

lruolin
07-29-2021

I was looking at a dateset at work, and was wondering how I can carry out t-test to check if there were any significant difference in flavor compounds between two different species of the same fruit.

Previously, I looked at how to carry out t-tests efficiently, using rstatix package.

For ANOVA, I can use purrr to carry out iterative statistical testing. If I want to look at differences in chemical groups for 3 different types of stinky tofu, the strategy would be to calculate the % area for each group (analysed as triplicates), and then nest by the chemical group, carry out anova test, and then create a new column to show p-values.

I used fictitious data, but is based on what I saw from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6321173/

Load Packages

Load dataset

tofu <- 
  tibble::tribble(
          ~group_no,             ~groups,   ~a1,   ~a2,   ~a3,   ~b1,   ~b2,  ~b3,  ~c1,   ~c2,  ~c3,
                 1L,           "phenols", 23.32, 23.56, 22.97, 47.87,  46.6, 46.9, 9.93, 10.21, 9.79,
                 2L,            "ethers",  0.02,  0.03,  0.01,  0.99,  0.94, 0.97, 14.3,  14.4, 14.3,
                 3L, "aldehydes_ketones",  0.21,  0.24,  0.25,  0.02,  0.02, 0.01, 0.11,  0.12, 0.11,
                 4L,           "indoles", 22.99,  22.9, 22.93,  35.2, 35.22, 35.9,  5.5,   5.3,  5.4
          )


tofu
# A tibble: 4 × 11
  group_no groups         a1    a2    a3    b1    b2    b3    c1    c2
     <int> <chr>       <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1        1 phenols     23.3  23.6  23.0  47.9  46.6  46.9   9.93 10.2 
2        2 ethers       0.02  0.03  0.01  0.99  0.94  0.97 14.3  14.4 
3        3 aldehydes_…  0.21  0.24  0.25  0.02  0.02  0.01  0.11  0.12
4        4 indoles     23.0  22.9  22.9  35.2  35.2  35.9   5.5   5.3 
# … with 1 more variable: c3 <dbl>

Visualization

tofu_long <- tofu %>% 
  pivot_longer(cols = a1:c3,
               names_to = "sample",
               values_to = "percent_area") %>% 
  mutate(sample_cleaned = fct_recode(sample,
                                     a = "a1",
                                     a = "a2",
                                     a = "a3",
                                     b = "b1",
                                     b = "b2",
                                     b = "b3",
                                     c = "c1",
                                     c = "c2",
                                     c = "c3"))

tofu_long%>% 
  group_by(sample_cleaned, groups) %>% 
  summarise(av_percent_area = mean(percent_area)) %>% 
  ggplot(aes(sample_cleaned, av_percent_area)) +
  geom_boxplot(aes(fill = sample_cleaned)) +
  facet_wrap( ~ groups, scales = "free") +
  labs(x = "") +
  theme_classic() +
  theme(legend.position = "none")

Anova

library(broom)

results_anova <- tofu_long %>% 
  nest(-group_no) %>% 
  mutate(model = map(data, ~aov(percent_area ~ sample_cleaned, .)),
         tidy = map(model, tidy)) %>% 
  select(group_no, tidy) %>% 
  unnest(cols = c(tidy)) %>% 
  filter(term != "Residuals")

results_anova
# A tibble: 4 × 7
  group_no term              df     sumsq    meansq statistic  p.value
     <int> <chr>          <dbl>     <dbl>     <dbl>     <dbl>    <dbl>
1        1 sample_cleaned     2 2125.     1063.         5550. 1.58e-10
2        2 sample_cleaned     2  384.      192.       141800. 9.47e-15
3        3 sample_cleaned     2    0.0707    0.0353      212. 2.71e- 6
4        4 sample_cleaned     2 1366.      683.        11992. 1.56e-11
results_anova %>% 
  filter(p.value<0.05)
# A tibble: 4 × 7
  group_no term              df     sumsq    meansq statistic  p.value
     <int> <chr>          <dbl>     <dbl>     <dbl>     <dbl>    <dbl>
1        1 sample_cleaned     2 2125.     1063.         5550. 1.58e-10
2        2 sample_cleaned     2  384.      192.       141800. 9.47e-15
3        3 sample_cleaned     2    0.0707    0.0353      212. 2.71e- 6
4        4 sample_cleaned     2 1366.      683.        11992. 1.56e-11

From the analysis, all three compounds differed significantly for all four chemical groups. This is a efficient way of carrying out ANOVA test.

The analysis strategy is to nest, mutate, unnest, to efficiently extract the p.values. The Tukey’s post-hoc comparison can be carried out in a similar way.

References

http://ritsokiguess.site/docs/2018/01/30/tidy-simple-effects-in-analysis-of-variance/ https://towardsdatascience.com/anova-in-r-4a4a4edc9448

Citation

For attribution, please cite this work as

lruolin (2021, July 29). pRactice corner: Multiple ANOVA in R. Retrieved from https://lruolin.github.io/myBlog/posts/20210729 Multiple ANOVA in R for different variables/

BibTeX citation

@misc{lruolin2021multiple,
  author = {lruolin, },
  title = {pRactice corner: Multiple ANOVA in R},
  url = {https://lruolin.github.io/myBlog/posts/20210729 Multiple ANOVA in R for different variables/},
  year = {2021}
}